Name: A random sample of Wake County, North Carolina residential real estate plots

Type: Random Sample

Size: N = 100, 11 variables

Descriptive Abstract: The information for this data set was taken from a Wake County, North Carolina real estate database. Wake County is home to Raleigh, the capital of North Carolina, and to Cary. These are the fifteenth and eighth fastest growing cities in the USA respectively, helping Wake County become the ninth fastest growing county in the country. Wake County's population has grown 31.18% since 2000, to approximately 823,345 residents currently. This data set includes 100 randomly selected residential properties in the Wake County registry, denoted by their real estate ID numbers. For each selected property, 11 variables are recorded. These variables include year built, square feet, adjusted land value, and address, among others.

Sources: Wake County, via http://services.wakegov.com/realestate/, on 3-25-08

Variable Descriptions:
ID # - the county-given identification number for the selected plot
Year Built - the listed year in which the structure was built (by year)
Sq. Ft. - the area of the floor plan (in square feet)
Story - how many stories the structure has (in stories)
Acres - how many acres are included in the plot (in acres)
No. Baths - the number of bathrooms at the residence (in bathrooms)
Fireplaces - the number of fireplaces in the residence (in fireplaces)
Total $ - the total assessed value of the property (in dollars)
Land $ - the assessed value of the land (in dollars)
Building $ - the assessed value of the building (in dollars)
Zip - the zip code of the property
Empty cells represent a value not included in the property record.

Story Behind the Data: With Wake County nationally ranked for its growth in recent years, large databases of public data on its properties are becoming more readily available. Dr. Woodard uses these databases in one of the courses he teaches, through a CAUSEweb.org activity, because they provide many variables, such as those listed above, that can be used for correlation analysis. This data set was collected as a tool for showing results and comparing them with students' data sets collected in the same manner.

Special Notes: This data set was not compiled using the first 100 randomly obtained real estate identification numbers. Approximately 140 numbers were tried in order to obtain this set of 100; the numbers not included corresponded either to non-residential plots or to records that do not exist. The real estate ID numbers, which ranged from approximately 1 to 200000, were randomly generated using Microsoft Excel. All the data were found on the Wake County website and were not altered in any way. There is an activity posted on CAUSEweb.org by Dr. Woodard in which students collect their own version of this data set. A PDF version of this activity can be located at http://www.causeweb.org/repository/Realestate/Realestate.pdf

Pedagogical Notes: The most notable statistical characteristic of this data set is the presence of a natural outlier: the property with real estate ID number 78570. This property is an outlier in two ways, both of which can be easily seen graphically, which helps students visualize the effect an outlier has on regression lines. It includes 39.38 acres, while no other entry has more than 2 acres. This acreage pushes the land value and total value above 4.75 million dollars, far higher than the values of the other plots. Students can use this outlier to examine the impact of an outlier on regression and on correlation. Students can also be asked to identify the reason or reasons why this entry is an outlier.

Regression analysis can of course be used to determine which of the variables are good predictors of total value (simple linear regression). Students can be asked to graph variables against total value; for example, to graph square feet versus total value, examine the correlation coefficient and the regression equation, and compare them with those obtained for the other variables. Multiple regression can be used to investigate which sets of variables are good predictors of total value; for example, Year Built, Sq. Ft. and Land $ do quite well when the million dollar homes are removed.

References: http://services.wakegov.com/realestate/, on 11-2-08

Submitted By:
Dr. Roger Woodard
Professor/Head of North Carolina State University Undergraduate Dept.

Jason Leone
NCSU Junior in Statistics
jtleone@ncsu.edu
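
Illustrative Sketch: The Pedagogical Notes suggest having students examine how a single outlier changes correlation and regression. The Python sketch below shows one way this comparison might be set up. It does not use the actual Wake County values: the 99 "ordinary" plots are synthetic numbers generated only for illustration, and the relationship between acres and total value in them is an assumption. The outlier uses the 39.38 acres given above, with 4,750,000 dollars as a rounded placeholder for its total value. The helper names pearson_r and fit_line are hypothetical and written from scratch so the example has no dependencies.

```python
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


random.seed(0)  # fixed seed so repeated runs give the same numbers

# SYNTHETIC stand-in for the ordinary plots (NOT the real data set):
# acreage under 2 acres, total value loosely increasing with acreage.
acres = [random.uniform(0.1, 2.0) for _ in range(99)]
total = [150_000 + 50_000 * a + random.gauss(0, 40_000) for a in acres]

# One very large plot. 39.38 acres comes from the notes above;
# 4_750_000 is a rounded placeholder, not the recorded value.
acres_all = acres + [39.38]
total_all = total + [4_750_000]

r_without = pearson_r(acres, total)
r_with = pearson_r(acres_all, total_all)
slope_without, _ = fit_line(acres, total)
slope_with, _ = fit_line(acres_all, total_all)

print(f"r without outlier: {r_without:.3f}, with outlier: {r_with:.3f}")
print(f"slope without outlier: {slope_without:,.0f} dollars per acre")
print(f"slope with outlier:    {slope_with:,.0f} dollars per acre")
```

With the real data, students would replace the synthetic lists with the Acres and Total $ columns and compare the results with and without ID number 78570. Because one point lies so far from the rest in both variables, it largely determines the fitted line, which is the effect the notes ask students to observe.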